ECE 511 Exam 2
Fall 2005
Tuesday, December 13, 2005
· You are allowed to use any notes, books, papers, web sites, or other reference
material as you desire. No interactions with others are allowed.
· This exam is designed to take 120 minutes to complete. To allow for any
unforeseen difficulties, you are also allowed a 120-minute automatic extension.
You are allowed to work on the exam for a continuous period of up to 240 minutes
(four hours).
· This exam is based on lectures as well as class reading material. Each
true/false question is concerned with one topic we covered in the course.
· The questions are randomly selected from the topics we covered this semester.
· You can write down the reasoning behind your choice for up to five questions
for possible partial credit. Choose wisely.
· Please use the provided plain text template to submit your answers to
w-hwu@uiuc.edu by
· Good luck!
Part 1 (10 points): This part tests your understanding of
predicated execution. Label each of the following statements as T (true) or F
(false) according to the lectures and class reading.
A. (2 pts)
According to Mahlke, et al., “A comparison between Full and Partial Predication
Support for ILP processors,” the code sequence for a load instruction to be
predicated with partial predication support using conditional move instructions
requires control-speculative load instructions. This is because exceptions caused
by the load instruction must be ignored under execution conditions where the load
instruction should not be executed.
B. (2 pts) According to
Mahlke, et al., “A comparison between Full and Partial Predication Support for
ILP processors,” a basic block with store instructions cannot be predicated
with conditional move instructions. This is because there is not a good way to
nullify the effect of the store instruction under the execution conditions
where the store instruction should not be executed.
C. (2 pts) According to
August, et al., “Integrated Predicated and Speculative Execution in the IMPACT
EPIC Architecture,” predicate promotion allows the removal of predicates from
predicated instructions, thus removing control dependences and reducing
schedule length. This, however, requires control speculation support since the
instructions whose predicates are removed execute more frequently as a
result and are thus in effect speculative.
D. (2 pts) According to
August, et al., “Integrated Predicated and Speculative Execution in the IMPACT
EPIC Architecture,” the IMPACT EPIC Architecture supports an inline selective
recovery model that eliminates the need to generate recovery blocks for control
speculation.
E. (2 pts) According to Rau
and Fisher, “Instruction-Level Parallel Processing – History, Overview,
Perspective,” Lam attempts to achieve a better II than predication by
scheduling each leg of the control construct separately. This achieves a smaller
MII than predicated execution. However, Warter later showed that this causes
larger IIs when resource usage is complex.
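Several statements in this part concern if-conversion, in particular what happens to a guarded load once it executes unconditionally. The Python sketch below is only a model: it assumes a hypothetical non-faulting "speculative load" that returns a harmless default for an invalid address, all function names are illustrative, and Python's conditional expression stands in for a hardware conditional move (select).

```python
def speculative_load(mem, addr, default=0):
    # Hypothetical non-excepting load: an invalid address yields a default
    # value instead of raising, so the load may safely execute early.
    return mem[addr] if 0 <= addr < len(mem) else default


def guarded_load_branchy(mem, i):
    # Original control flow: the load executes only when its guard holds.
    if 0 <= i < len(mem):
        return mem[i]
    return -1


def guarded_load_if_converted(mem, i):
    # If-converted form: the guard becomes a value, the load executes
    # unconditionally (so it must not fault), and a select picks the result.
    cond = 0 <= i < len(mem)
    loaded = speculative_load(mem, i)
    return loaded if cond else -1  # stands in for a conditional move
```

Both versions return the same value for every index (for example, mem = [10, 20, 30] with i from -2 to 5). Replacing speculative_load with a plain mem[i] would make the if-converted version raise IndexError for i >= 3, even though the original code never performs that load.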
Part 2 (10 points): This part tests your understanding of vector processing.
Label each of the following
statements as T (true) or F (false) according to the lectures and class
reading.
A. (2 pts) Vectorization of a
loop involves loop distribution, which takes a loop with multiple statements in
the loop body and generates multiple loops that each contain only one statement
in the loop body.
B. (2 pts) A loop cannot be
vectorized if there is a backward loop-carried dependence from a statement in
the loop body to another one that appears earlier in the loop body.
C. (2 pts) According to
Russell, “The CRAY-1 Computer System,” vector merge and test instructions are
provided in the CRAY-1 to allow operations to be performed on individual vector
register elements designated by the contents of the vector mask register.
D. (2 pts) Vector chaining is
a technique that takes advantage of parallel operation of function units by
allowing a vector instruction to receive streamed results from another vector
instruction as these results emerge from a function unit.
E. (2 pts) According to
Russell, “The CRAY-1 Computer System,” the CRAY-1 Supercomputer has a vector
chaining window of one clock cycle.
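Loop distribution, which statement A refers to, can be illustrated with a minimal Python sketch. It assumes a two-statement loop whose only dependence is a forward, loop-independent one from S1 to S2 (the arrays and function names are illustrative). Because every S1 instance can run before every S2 instance without violating that dependence, the loop can be split, and each resulting loop corresponds to a single vector operation.

```python
def fused(b):
    n = len(b)
    a = [0] * n
    c = [0] * n
    for i in range(n):
        a[i] = b[i] + 1  # S1
        c[i] = a[i] * 2  # S2 reads the a[i] that S1 wrote in this iteration
    return a, c


def distributed(b):
    n = len(b)
    a = [0] * n
    c = [0] * n
    for i in range(n):  # loop 1 holds only S1: one vector add
        a[i] = b[i] + 1
    for i in range(n):  # loop 2 holds only S2: one vector multiply
        c[i] = a[i] * 2
    return a, c
```

Both versions produce identical arrays; for b = [1, 2, 3] each returns ([2, 3, 4], [4, 6, 8]).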
Part 3 (10
points): This part tests your
understanding of parallel processing fundamentals. Label each of the following
statements as T (true) or F (false) according to the lectures and class
reading.
A. (2 pts) In the data parallel model, the same computation is applied to parts
of a large data structure.
B. (2 pts) In a shared memory system, the same physical address refers to the
same data for all processor elements in the system.
C. (2 pts) In a distributed memory (message passing) system, the physical
address space of each processor does not need to accommodate all the data of the
program being executed, as long as it can accommodate the portion of the data
assigned to that processor.
D. (2 pts) According to Amdahl’s law, a parallel processing system that executes
a computation with 90% parallel and 10% serial components can achieve no more
than 100 times speedup regardless of the number of processors it has.
E. (2 pts) Mutual exclusion is designed for processors to correctly perform
multi-step updates to shared data structures during parallel execution.
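Amdahl's law bounds the speedup on N processors when a fraction p of the work is parallelizable: speedup(N) = 1 / ((1 - p) + p / N), which approaches 1 / (1 - p) as N grows. A small Python sketch of that arithmetic follows; the function names are illustrative, and p must be below 1 for the limit to be finite.

```python
def amdahl_speedup(p, n):
    """Speedup on n processors when fraction p of the work is parallel."""
    serial = 1.0 - p
    return 1.0 / (serial + p / n)


def amdahl_limit(p):
    """Upper bound on speedup as the processor count grows without bound (p < 1)."""
    return 1.0 / (1.0 - p)
```

For example, with p = 0.95 the speedup is about 6.9 on 10 processors and about 16.8 on 100, and it can never exceed 20 no matter how many processors are added.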
Part 4 (10 points): This part tests your understanding of the Illinois
(MESI) cache coherence protocol. Label each of the following statements as T
(true) or F (false) according to the lectures and class reading.
A. (2 pts) The Illinois
protocol defines four possible states for each cache line; two of the states
indicate that the cache line is present in the local cache, not in any other
caches.
B. (2 pts) In an Illinois
protocol system, main memory supplies data only when no cache memory contains
the data.
C. (2 pts) In an Illinois
protocol system, when a cache line is in the “shared” state in the local cache,
it must be present in at least one other cache according to the specification
we gave in the lecture.
D. (2 pts) The Illinois
protocol is a snooping protocol, which requires all processors to monitor all
bus transactions in order to maintain the correct state of their cache lines.
E. (2 pts) In an Illinois
protocol system, if processor A requests a cache line that is in the “modified”
state in processor B’s cache, the protocol forces a write-back by processor B’s
cache to the main memory.
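The per-line state changes of the Illinois (MESI) protocol can be sketched as a small transition function in Python. This is a simplified model under stated assumptions: it tracks only one line in one cache, takes the bus "shared" signal as an input on a local read miss, and deliberately leaves out data movement (which party supplies the data, and when write-backs happen). The event names are hypothetical.

```python
from enum import Enum


class State(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"


def next_state(state, event, others_have_copy=False):
    """Return the next state of one cache line in one cache (simplified MESI).

    event: "local_read" / "local_write" come from this cache's processor;
           "bus_read" / "bus_write" are another processor's requests,
           observed by snooping ("bus_write" = read-exclusive/invalidate).
    others_have_copy: the bus "shared" signal, consulted only on a read miss.
    """
    if event == "local_read":
        if state is State.I:  # read miss
            return State.S if others_have_copy else State.E
        return state  # read hit: no state change
    if event == "local_write":
        # From I or S, other copies must be invalidated first (and the line
        # fetched when starting from I); from E the upgrade needs no bus traffic.
        return State.M
    if event == "bus_read":
        # Another cache is reading: any valid copy here becomes shared.
        return State.I if state is State.I else State.S
    if event == "bus_write":
        return State.I  # another cache wants exclusive ownership
    raise ValueError(f"unknown event: {event!r}")
```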
Part 5 (10 points): This part
tests your understanding of memory consistency models.
Label each of the following
statements as T (true) or F (false) according to the lectures and class
reading.
A. (2 pts) In a sequential
consistency system, a store to a memory location cannot be done until all
previous loads from the same processor have completed.
B. (2 pts) In a processor
consistency system, a load from a memory location does not need to wait until
all previous stores from the same processor have completed.
C. (2 pts) In a weak
consistency system, synchronization instructions must be used to enforce ordering
between loads and stores to different locations by the same processor.
D. (2 pts) The release
consistency model improves upon the weak consistency model by using two
distinct instructions for synchronization: one for acquiring locks and one for
releasing locks. All previous stores from the same processor must complete
before a release instruction can proceed but they do not have to complete
before an acquire instruction can proceed.
E. (2 pts) In the DASH
system, the implementation of release consistency requires a local
processor to wait until all invalidation requests to remote clusters are
acknowledged before an acquire instruction can proceed.
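The difference between sequential consistency and models that let a load bypass earlier stores shows up in the classic two-processor litmus test: P0 runs x = 1; r1 = y and P1 runs y = 1; r2 = x, with x and y initially 0. The Python sketch below enumerates every interleaving. It is a simplified model that assumes relaxation only means a processor's load may be performed before its own earlier store to a different location; store buffers and forwarding are not modeled, and all names are illustrative.

```python
def interleavings(a, b):
    """All merges of sequences a and b that keep each one's internal order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def execute(ops):
    """Run a sequence of ("st", loc, None) / ("ld", loc, reg) operations."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for kind, loc, reg in ops:
        if kind == "st":
            mem[loc] = 1
        else:
            regs[reg] = mem[loc]
    return regs["r1"], regs["r2"]


def outcomes(allow_load_bypass=False):
    """Set of possible (r1, r2) results for this store/load litmus test."""
    p0 = [("st", "x", None), ("ld", "y", "r1")]
    p1 = [("st", "y", None), ("ld", "x", "r2")]
    # With bypass allowed, each processor may also perform its load first.
    orders0 = [p0, p0[::-1]] if allow_load_bypass else [p0]
    orders1 = [p1, p1[::-1]] if allow_load_bypass else [p1]
    results = set()
    for o0 in orders0:
        for o1 in orders1:
            for seq in interleavings(o0, o1):
                results.add(execute(seq))
    return results
```

Under sequential consistency, outcomes() contains only (0, 1), (1, 0), and (1, 1); allowing the bypass adds (0, 0), which is the result that synchronization must rule out when a program depends on it.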
Part 6 (10
points): This part tests your understanding of directory-based cache coherence
protocols and the DASH case study.
Label each of the following
statements as T (true) or F (false) according to the lectures and class
reading.
A. (2 pts) In a directory-based protocol, one must use invalidation rather than
update for coherence activities.
B. (2 pts) In a directory-based protocol, each processor no longer needs to
monitor all memory transactions made by all processors in the system.
C. (2 pts) In the DASH
directory protocol, a load miss may trigger up to three system-level transactions:
local processor to home cluster, home cluster to remote cluster, remote cluster
to local processor.
D. (2 pts) In the DASH
directory protocol, the directory for each memory line is designed to precisely
track all remote clusters whose processor(s) contain the line in their caches.
E. (2 pts) The DASH directory protocol is based on the
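A generic full-map (bit-vector) directory entry can be sketched in Python to show why a directory protocol avoids snooping: the home node records which nodes may hold a copy and contacts only those nodes. This is a simplified illustration, not DASH's actual directory organization; the class and method names, and the choice to have a read of a dirty line turn the owner's copy into a shared one, are assumptions of the sketch.

```python
class DirectoryEntry:
    """Hypothetical full-map directory entry for one memory line."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.presence = 0   # bit i set: node i may hold a copy
        self.dirty = False  # True: exactly one node holds a modified copy

    def read(self, node):
        """Record a read; return the node holding a dirty copy, else None."""
        owner = None
        if self.dirty:
            owner = self._owner()
            self.dirty = False  # in this sketch the owner's copy becomes shared
        self.presence |= 1 << node
        return owner

    def write(self, node):
        """Record a write; return the other nodes that must be invalidated."""
        victims = [i for i in range(self.num_nodes)
                   if (self.presence >> i) & 1 and i != node]
        self.presence = 1 << node
        self.dirty = True
        return victims

    def _owner(self):
        # When dirty, exactly one presence bit is set; return its index.
        return self.presence.bit_length() - 1
```

Invalidations go only to the nodes the entry lists, so no node has to observe every memory transaction in the system.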
Part 7 (10
points): This part tests your
understanding of multithreaded architectures.
Label each of the following
statements as T (true) or F (false) according to the lectures and class
reading.
A. (2 pts) In the Control Data
6600, the central processing unit is multithreaded into ten logical processors.
B. (2 pts) Multithreading in
the Control Data 6600 is realized by dividing up a major cycle into ten minor
cycles. During every major cycle, each of the ten logical processors occupies
one of the ten minor cycles by performing one step of an instruction on its
dynamic information (register contents).
C. (2 pts) Multithreading in
the HEP processor is accomplished by allowing each process to take up one
processor cycle on an alternating basis. By putting enough time separation
between instructions from the same process, the processor does not need to
provide any pipeline interlocking or bypassing logic.
D. (2 pts) According to Marr,
et al. “Hyper-Threading Technology Architecture and Microarchitecture,” the trace
cache does not need to be modified or expanded to accommodate multithreading.
E. (2 pts) According to Marr,
et al. “Hyper-Threading Technology Architecture and Microarchitecture,” the
Xeon implementation allows each logical processor to make progress even though
another might fill up all of its allotted buffer entries due to a very
long-latency cache miss.
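Fine-grained, cycle-by-cycle interleaving (a "barrel" organization) can be sketched in Python: each cycle issues from the next thread in a fixed rotation, so with n threads, consecutive instructions from the same thread issue n cycles apart. The function and instruction names are hypothetical, and the model ignores stalls; a finished thread's slot is simply left empty rather than given to another thread.

```python
def barrel_schedule(programs, cycles):
    """Issue one instruction per cycle, rotating through threads in fixed order.

    programs: one list of instruction names per thread.
    Returns a trace of (cycle, thread, instruction) tuples.
    """
    n = len(programs)
    pcs = [0] * n  # per-thread program counters
    trace = []
    for cycle in range(cycles):
        t = cycle % n  # this cycle's slot belongs to thread t
        if pcs[t] < len(programs[t]):
            trace.append((cycle, t, programs[t][pcs[t]]))
            pcs[t] += 1
        # otherwise the slot goes unused
    return trace
```

With three threads, thread 0 issues at cycles 0, 3, 6, and so on, which gives each of its instructions three cycles to progress through the pipeline before the thread's next instruction issues.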
Part 8 (10
points): This part tests your
understanding of virtual machine architectures. Label each of the following
statements as T (true) or F (false) according to the lectures and class
reading.
A. (2 pts) Automatic memory
management with garbage collection eliminates the need for programmers to
explicitly allocate and free heap data.
B. (2 pts) Virtual function
calls and inheritance hierarchies allow programmers to introduce additional functionality
by adding new functions without making changes to the original functions.
C. (2 pts) The “throw” and
“catch” exception handling model allows an exception caused by a function deep
in a function call chain to pass through intermediate functions in the chain
without being processed or even relayed by them. The exception can eventually be
detected and handled by a function that is higher than these intermediate
functions in the call chain.
D. (2 pts) Java improves the
security of application execution by not allowing pointer arithmetic operations
and by performing array bounds checking. This eliminates the possibility for an
application to examine or change the contents of memory locations that it is
not allowed to access.
E. (2 pts) Running each
application on top of its own VM isolates the application from the accidental
and malicious failures of other applications in the system.
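Throw/catch propagation can be illustrated with Python's raise/except, which follows the same rule: an exception raised deep in a call chain unwinds through intermediate functions that contain no handling code until a function with a matching handler catches it. The function names below are hypothetical.

```python
def parse_record(text):
    # Deepest function: detects the error and raises it, without knowing
    # which caller (if any) will handle it.
    if not text.isdigit():
        raise ValueError(f"bad record: {text!r}")
    return int(text)


def load_batch(items):
    # Intermediate function: contains no error-handling code at all.
    return [parse_record(t) for t in items]


def process(items):
    # Higher-level function: the only place the error is handled.
    try:
        return sum(load_batch(items))
    except ValueError as err:
        return f"rejected ({err})"
```

process(["1", "2"]) returns 3, while process(["1", "x"]) returns "rejected (bad record: 'x')" even though load_batch never mentions the exception.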
Part 9 (10
points): This part tests your
understanding of the special-purpose architectures. The questions are based on
the lecture by Alben from NVIDIA. Label each of the following statements as T
(true) or F (false) according to the lectures and class reading.
A. (2 pts) The NVIDIA GeForce
6800 achieves high compute efficiency by performing all computation in
fixed-point arithmetic.
B. (2 pts) Texture filtering
is performed during vertex processing, where the texture is painted on
triangles.
C. (2 pts) Frame Buffers in
the NVIDIA GeForce 6800 are based on SRAM technology for high-speed data
access.
D. (2 pts) Pixel processors
use vector processing to take advantage of the fact that pixels require similar
computation.
E. (2 pts) The Z-buffer is
used to compute the visibility of objects during rendering.
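A minimal Python sketch of the Z-buffer depth test, assuming each fragment arrives as (x, y, depth, color) with a smaller depth meaning closer to the viewer (depth conventions vary between graphics APIs): a fragment is written only if it is nearer than what the buffer already holds for that pixel. The function name and fragment format are illustrative.

```python
import math


def zbuffer_render(width, height, fragments):
    """Resolve visibility per pixel; fragments are (x, y, depth, color)."""
    depth = [[math.inf] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for x, y, z, c in fragments:
        if z < depth[y][x]:  # depth test: keep only the nearest fragment so far
            depth[y][x] = z
            color[y][x] = c
    return color
```

Because each pixel keeps the nearest fragment seen so far, the final image does not depend on drawing order when depths are distinct; ties go to whichever fragment arrives first.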